Abstract

In contingency table analysis, the odds ratio is a commonly applied measure used to
summarize the degree of association between two categorical variables, say R and S.
Suppose now that for each individual in the table, a vector of continuous variables X is also
observed. It is then vital to analyze whether and how the degree of association varies with
X. In this work, we extend the classical odds ratio to the conditional case, and develop
nonparametric estimators of this "pointwise odds ratio" to summarize the strength of
local association between R and S given X. To allow for maximum flexibility, we make
this extension using kernel regression. We develop confidence intervals based on these
nonparametric estimators. We demonstrate via simulation that our pointwise odds ratio
estimators can outperform model-based counterparts from logistic regression and GAMs,
without the need for a linearity or additivity assumption. Finally, we illustrate their
application to a dataset of patients from an intensive care unit (ICU), offering greater insight into
how the association between patient survival and type of admission (emergency versus elective)
varies with the patients' ages.
1 Introduction



Consider a two-way contingency table with row and column variables R and S, having levels 
i = 1, . . . , r and j = 1, . . . , s respectively. A commonly used measure to summarize the 
degree of association between R and S is the odds ratio. In the case of r = s = 2, the odds 
ratio exhibits the simple form 

$$\mathrm{OR} = \frac{p_{11}\,p_{22}}{p_{12}\,p_{21}}, \qquad (1)$$

where $p_{ij} = P(R = i, S = j)$. A sample estimate of OR is obtained by replacing $p_{ij}$ with the
observed sample proportions $\hat{p}_{ij} = n_{ij}/n$. Due to its intuitive interpretation in terms of odds
and conditional probabilities, OR is often used in general r x s tables also, generating a set of
odds ratios (Agresti, 2002, Chapter 2).
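As a small illustration of the sample version of (1), the sketch below computes the odds ratio from the four cell counts of a 2 x 2 table. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the ICU data.

```python
def sample_odds_ratio(n11, n12, n21, n22):
    """Sample odds ratio for a 2 x 2 table of counts, following equation (1)."""
    n = n11 + n12 + n21 + n22
    # Observed cell proportions p_ij = n_ij / n.
    p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
    if p12 == 0 or p21 == 0:
        raise ZeroDivisionError("odds ratio undefined when n12 or n21 is zero")
    return (p11 * p22) / (p12 * p21)


# Hypothetical counts, for illustration only.
example_or = sample_odds_ratio(30, 10, 20, 40)
print(example_or)
```

Because the sample size n cancels, the same value is obtained directly from the counts as n11 n22 / (n12 n21).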



Suppose now that for each observation making up the table, a vector of continuous covariates X
is also observed. As a motivating example, we consider a dataset from Hosmer and Lemeshow
(2000), comprising 200 patients discharged from an adult intensive care unit (ICU). The data
are cross-classified by survival status following hospital discharge (0 = Lived; 1 = Died) and
type of admission into ICU (0 = Elective; 1 = Emergency), as shown in Table 1. Along with
these two variables, the age of each patient at the time of admission was also recorded. We
are interested in seeing whether and how the association between survival and admission type
varies according to age. More generally, we want to quantify the degree of local association
between R and S conditional on X = x.

A traditional method for accomplishing this involves discretizing X into several levels, and
considering the odds ratio in each partial table (Ahrens and Pigeot, 2006). This technique,
however, does not preserve the continuous nature of X (age), resulting in a potential loss of
information. A more commonly applied method is a model-based one, utilizing the odds ratio
resulting from the logistic regression model below,

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i + \beta_2 r_i + \beta_3 x_i r_i, \qquad (2)$$

where $p_i = P(S = 1 \mid R = r_i, X = x_i)$ is the conditional probability of 'success' for the i-th
observation. The local odds ratio is then given by $\mathrm{OR}(x) = e^{\beta_2 + \beta_3 x}$. For a general r x s table,
an extension can be made using polytomous response regression (Agresti, 2002, Chapter 7).
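The expression for the local odds ratio follows directly from model (2). As a short worked step, assuming R is coded 0/1 (as the admission type is in the ICU example), subtracting the two logits at a fixed x gives:

```latex
\begin{aligned}
\log \mathrm{OR}(x)
  &= \operatorname{logit} P(S = 1 \mid R = 1, X = x)
   - \operatorname{logit} P(S = 1 \mid R = 0, X = x) \\
  &= (\beta_0 + \beta_1 x + \beta_2 + \beta_3 x) - (\beta_0 + \beta_1 x) \\
  &= \beta_2 + \beta_3 x .
\end{aligned}
```

Under this model the log odds ratio is therefore constrained to be linear in x.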



However, since these odds ratios are by-products of Generalized Linear Models (GLMs,
McCullagh and Nelder, 1989), they incur the problems associated with parametric regression. The
logit linearity assumption means these measures lack flexibility and risk model mis-specification.
For instance, it is clear from (2) that the odds ratio is required to increase or decrease
exponentially in X, which is an overly strict demand.

To introduce greater flexibility, a commonplace alternative is to use a Generalized Additive
Model (GAM, Hastie and Tibshirani, 1990) instead:

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + f_1(x_i) + \beta_1 r_i + r_i f_2(x_i), \qquad (3)$$

where $f_1(\cdot)$ and $f_2(\cdot)$ are two separate smoothers of x. Equivalently, (3) can also be regarded as
a varying coefficient model (Hastie and Tibshirani, 1993). In fact, this model nonparametrically
fits two separate curves, one for each level of r, and the log odds ratio estimate is obtained from
the difference of these two curves (Hastie and Tibshirani, 1990). Using GAMs to estimate local
odds ratios has been considered before by Zhao et al. (1996) and Figueiras and Cadarso-Suárez
(2001), amongst others, although their motivation stemmed from a regression context and thus
considered R as continuous also. Cadarso-Suárez et al. (2005) proposed estimation of odds
ratios using GAMs with unknown link functions, but their developments were again limited
to continuous R. Additionally, their simulations only considered datasets of size n = 1000,
meaning performance was not assessed for low to moderate sample sizes.

In contrast, as reflected in the ICU dataset example, our motivation arises from analyses of 
contingency tables. We seek a flexible measure of local association that is not model-based in 
any sense. 

In this paper, we propose a fully nonparametric measure of conditional association, formed by
extending the global OR to the local case. By exploiting the flexibility of kernel regression,
our "pointwise odds ratio" permits a continuous X, while avoiding the hazards of model
mis-specification. Using kernel regression to estimate the pointwise odds ratio was first suggested
by Geenens and Simar (2010), although it was not explored in any depth there. This idea was
also independently proposed by Chen et al. (2011), although our work explores the problem
much more thoroughly. Specifically, we propose adjusted estimators of the pointwise log odds
ratio which have better statistical properties compared to a basic plug-in approach. We also
develop confidence intervals for these new estimators. Applying these methods to the ICU
dataset, we are able to gain a more nuanced view of the underlying relationships between age,
type of admission, and survival status.



2 The pointwise odds ratio 

The pointwise odds ratio is an intuitive extension of the global odds ratio defined in (1), formed
using the conditional probabilities $p_{ij}(x) = P(R = i, S = j \mid X = x)$,

$$\mathrm{OR}(x) = \frac{p_{11}(x)\,p_{22}(x)}{p_{12}(x)\,p_{21}(x)}, \qquad (4)$$

for r = s = 2. Equation (4) can be broadened to produce a set of pointwise odds ratios for a
general r x s table, but we restrict developments here to the simplest case. Also for simplicity
here, we restrict attention to univariate X, with the developments in this work generalizable to
the multivariate case. Evidently, OR(x) > 0, with OR(x) = 1 implying conditional
independence of R and S at X = x.

For the developments in this paper, the following distributional assumption is made:

Assumption 2.1. The sample of observations can be described by $\{(X_k, Z_k)\}_{k=1}^{n}$, which form
a sequence of i.i.d. replications of $(X, Z) \in S_X \times \{z \in \{0, 1\}^4 : \sum_{ij=11}^{22} z^{(ij)} = 1\}$, a random
vector such that $Z \mid X \sim \mathrm{Multinomial}(1, p(X))$, where $p(x) = (p_{11}(x), p_{12}(x), p_{21}(x), p_{22}(x))^t$.

We use the shorthand $\sum_{ij=11}^{22}$ to denote $\sum_{i=1}^{2}\sum_{j=1}^{2}$. Assumption 2.1 underlies most cross-sectional
studies and surveys, as well as epidemiological studies consisting of a single cohort at baseline
(see the ICU example). Qualitatively, Assumption 2.1 states that for each cell (i, j), we observe
a binary response vector coming from component (ij) of each $Z_k$. Along with $X_k$,
Nadaraya-Watson regression (NW, Wand and Jones, 1995) can be used to estimate $p_{ij}(x)$ for all four
cells. This estimator is a sensible one to choose since, being a locally weighted average, it
automatically guarantees estimated probabilities between 0 and 1, unlike local linear or
P-spline estimators for instance. It also ensures maximum flexibility in the estimation of OR(x).
For ij = 11, . . . , 22, we have

$$\hat{p}^h_{ij}(x) = \sum_{k=1}^{n} W_h(x, X_k)\, Z_k^{(ij)}, \quad \text{where } W_h(x, X_k) = K\!\left(\frac{x - X_k}{h}\right) \Big/ \sum_{l=1}^{n} K\!\left(\frac{x - X_l}{h}\right), \qquad (5)$$

where K(·) and h denote the kernel function and bandwidth respectively. For the latter, an
optimal h is obtained by minimizing the asymptotic mean integrated squared error (AMISE) of
$\hat{p}^h_{ij}(x)$. Defining $\nu_0 = \int K^2(x)\,dx$ and $\kappa_2 = \int x^2 K(x)\,dx$, then from standard kernel regression
theory (Wand and Jones, 1995) we have



$$h^{(ij)}_{\mathrm{opt}} = \left( \frac{\nu_0 \int_{S_X} \sigma^2_{ij}(x)\,dx}{4\,\kappa_2^2 \int_{S_X} b^2_{ij}(x) f(x)\,dx} \right)^{1/5} n^{-1/5}, \qquad (6)$$

where $\sigma^2_{ij}(x) = p_{ij}(x)(1 - p_{ij}(x))$, $b_{ij}(x) = p''_{ij}(x)/2 + p'_{ij}(x) f'(x)/f(x)$, and f(x) is the
marginal density of X. Although the theory suggests that we should use four different
bandwidths, one for each cell, it is argued in Geenens and Simar (2010, Section 2.3) that it is more
appealing instead to use a single, common h for all cells, and this is what we will do here.



Then, it was shown in the same paper that, if $\lim_{n \to \infty} \sqrt{n h^5} = \lambda$ with $0 \le \lambda < \infty$, then

$$\sqrt{nh}\,\big(\hat{p}^h_{ij}(x) - p_{ij}(x)\big) \longrightarrow N\!\left( \kappa_2 \lambda\, b_{ij}(x),\ \frac{\nu_0}{f(x)}\, p_{ij}(x)(1 - p_{ij}(x)) \right).$$

For $h \sim n^{-1/5}$, as suggested by (6), we have $\lambda > 0$, implying that the distribution of
$\sqrt{nh}\,(\hat{p}^h_{ij}(x) - p_{ij}(x))$ is not asymptotically centered at 0. To deal with this undesired feature, we choose a
sub-optimal bandwidth $h = o(n^{-1/5})$ ("undersmoothing": the bias is asymptotically negligible
and the mean squared error is dominated by the variance), as suggested among others by
Hall (1992). A common choice is to take $h \sim n^{-1/4}$, and indeed in this article our developments
will be presented with this order of h in mind. Hence, the bias in the normality statement
asymptotically vanishes and one obtains

$$\sqrt{nh}\,\big(\hat{p}^h_{ij}(x) - p_{ij}(x)\big) \longrightarrow N\!\left( 0,\ \frac{\nu_0}{f(x)}\, p_{ij}(x)(1 - p_{ij}(x)) \right).$$
Utilizing the conditional multinomial Assumption 2.1 and the Cramér-Wold device, a vectorial
version is finally obtained for $\hat{p}^h(x) = (\hat{p}^h_{11}(x), \hat{p}^h_{12}(x), \hat{p}^h_{21}(x), \hat{p}^h_{22}(x))^t$:

$$\sqrt{nh}\,\big(\hat{p}^h(x) - p(x)\big) \longrightarrow N\!\left( 0,\ \frac{\nu_0}{f(x)} \big( \mathrm{diag}(p(x)) - p(x)p(x)^t \big) \right), \qquad (7)$$

where diag(p(x)) denotes a 4 x 4 diagonal matrix with elements equal to the components of
p(x).

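Now, a simple plug-in estimator of OR(x) is given by substitution of the NW conditional
probabilities,

$$\widehat{\mathrm{OR}}^h(x) = \frac{\hat{p}^h_{11}(x)\,\hat{p}^h_{22}(x)}{\hat{p}^h_{12}(x)\,\hat{p}^h_{21}(x)}.$$

Furthermore, an asymptotic 100(1 - α)% confidence interval (CI) for log(OR(x)) can be obtained
via the delta method on (7):

$$\log\big(\widehat{\mathrm{OR}}^h(x)\big) \pm z_{1-\alpha/2} \sqrt{ \frac{\nu_0}{n h \hat{f}_h(x)} \sum_{ij=11}^{22} \frac{1}{\hat{p}^h_{ij}(x)} }, \qquad (8)$$

where $z_{1-\alpha/2}$ is the (1 - α/2) quantile of the standard normal distribution, and f(x) is estimated
by the standard kernel density estimator

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{k=1}^{n} K\!\left( \frac{X_k - x}{h} \right).$$

For simplicity, we use the same kernel K(·) and bandwidth h as in $\hat{p}^h_{ij}(x)$, although this does
not need to be the case.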
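The plug-in estimator and the delta-method interval (8) can be sketched in a few lines. The following is a minimal illustration under stated assumptions rather than a reference implementation: it assumes a Gaussian kernel (for which the integral of K squared is 1/(2√π)), encodes each observation's cell as an integer 0 to 3 for cells 11, 12, 21, 22, and uses hypothetical function names and toy data.

```python
import math

# Assumption: Gaussian kernel, for which nu_0 = integral of K^2 = 1 / (2 * sqrt(pi)).
NU0 = 1.0 / (2.0 * math.sqrt(math.pi))


def gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_cell_probs(x, xs, cells, h):
    """NW estimates of (p11, p12, p21, p22) at x, plus the kernel density estimate.

    cells[k] is 0, 1, 2 or 3 according to whether observation k falls in
    cell 11, 12, 21 or 22 (a hypothetical encoding of the vector Z_k).
    """
    weights = [gauss((x - xk) / h) for xk in xs]
    total = sum(weights)
    probs = [0.0, 0.0, 0.0, 0.0]
    for w, c in zip(weights, cells):
        probs[c] += w / total
    f_hat = total / (len(xs) * h)
    return probs, f_hat


def plugin_log_or_ci(x, xs, cells, h, z=1.959964):
    """Plug-in log odds ratio at x and its delta-method interval, as in (8)."""
    p, f_hat = nw_cell_probs(x, xs, cells, h)
    log_or = math.log(p[0] * p[3] / (p[1] * p[2]))
    se = math.sqrt(NU0 / (len(xs) * h * f_hat) * sum(1.0 / pij for pij in p))
    return log_or, (log_or - z * se, log_or + z * se)


# Toy data, for illustration only: covariate on a grid, cells cycling through 0..3.
xs = [-2.0 + 0.1 * k for k in range(41)]
cells = [k % 4 for k in range(41)]
h = 0.5
probs, f_hat = nw_cell_probs(0.0, xs, cells, h)
log_or, (lower, upper) = plugin_log_or_ci(0.0, xs, cells, h)
```

Because the NW weights are positive and sum to one, the four estimated probabilities always lie in [0, 1] and sum to one; the logarithm is nevertheless undefined whenever one of them is exactly zero.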

3 An amended estimator for log(OR(x))
3.1 Motivation 

Although the plug-in estimator and associated CI are easy to calculate, they suffer from two
major drawbacks. First, like its classical unconditional counterpart, $\log(\widehat{\mathrm{OR}}^h(x))$ may be severely
biased in finite samples. This is confirmed in the simulations of Section 3.3. Second, if one
or more of the $\hat{p}^h_{ij}(x)$'s are close to 0, then $\widehat{\mathrm{OR}}^h(x)$ will either be close to 0 also or highly
inflated. Since (8) has asymptotic variance proportional to $\sum_{ij=11}^{22} 1/p_{ij}(x)$, a small value for one
of the $\hat{p}^h_{ij}(x)$'s also significantly enlarges the (estimated) variance, making confidence intervals
of little use. To remedy these two problems, we propose adding a small deterministic value
$\epsilon(x) > 0$ to each $\hat{p}^h_{ij}(x)$. This leads to an amended estimator

$$\log\big(\widetilde{\mathrm{OR}}^h(x)\big) = \log\left( \frac{(\hat{p}^h_{11}(x) + \epsilon(x))(\hat{p}^h_{22}(x) + \epsilon(x))}{(\hat{p}^h_{12}(x) + \epsilon(x))(\hat{p}^h_{21}(x) + \epsilon(x))} \right). \qquad (9)$$



We seek a value of $\epsilon(x)$ for which $\log(\widetilde{\mathrm{OR}}^h(x))$ has asymptotically smaller bias compared to
$\log(\widehat{\mathrm{OR}}^h(x))$. Although other methods of bias correcting an odds ratio estimator are available
(see for instance Wang, 1997, who uses bootstrapping), these techniques are likely to produce
similar statistical improvements compared to simply adding a small $\epsilon(x)$, at the cost of greater
computational intensity. Also, it is important to recognize that such an approach (adding a
small deterministic value to each probability) has been taken before for OR. Specifically, we
have the adjusted measure proposed by Haldane (1955),

$$\mathrm{OR}_{\mathrm{adj}} = \frac{(\hat{p}_{11} + \frac{1}{2n})(\hat{p}_{22} + \frac{1}{2n})}{(\hat{p}_{12} + \frac{1}{2n})(\hat{p}_{21} + \frac{1}{2n})}, \qquad (10)$$

as a reduced bias estimator of OR. Furthermore, Walter and Cook (1991) compared several
estimators of OR, and found $\log(\mathrm{OR}_{\mathrm{adj}})$ to perform well with regard to bias and mean squared
error. The form of (10) is insightful not only because it is analogous to $\widetilde{\mathrm{OR}}^h(x)$, but because it shows
that the adjustment made was $O(n^{-1})$, i.e., the variance rate of the parametric estimators $\hat{p}_{ij}$.
This suggests it might be appropriate to select $\epsilon(x) \sim (nh)^{-1}$ in our nonparametric setting, i.e.,
the variance rate of the kernel-based estimators.
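To make the effect of the adjustment in (10) concrete, the sketch below evaluates it on a table with an empty cell, where the unadjusted odds ratio is undefined. The function name and the counts are hypothetical; adding 1/(2n) to each proportion is the same as adding 0.5 to each count.

```python
def haldane_adjusted_or(n11, n12, n21, n22):
    """Adjusted odds ratio of equation (10): add 1/(2n) to every cell proportion."""
    n = n11 + n12 + n21 + n22
    adj = 1.0 / (2.0 * n)
    p11, p12, p21, p22 = (c / n + adj for c in (n11, n12, n21, n22))
    return (p11 * p22) / (p12 * p21)


# Hypothetical table with an empty cell (n12 = 0): the unadjusted ratio would
# divide by zero, whereas the adjusted one is finite.
zero_cell_or = haldane_adjusted_or(10, 0, 5, 5)
print(zero_cell_or)
```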



3.2 Choosing e(x) 

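By applying a number of Taylor expansions and utilizing some standard kernel regression
theory results on the moments of the NW estimator, we derived a general expression for the bias
of the amended estimator (9); see Appendix A for relevant assumptions and proof. It turns out
that

$$\mathrm{Bias}\big(\log(\widetilde{\mathrm{OR}}^h(x))\big) = h^2 \kappa_2 \sum_{ij=11}^{22} (-1)^{i+j} \frac{b_{ij}(x)}{p_{ij}(x)} + \left( \epsilon(x) - \frac{\nu_0}{2 n h f(x)} \right) \sum_{ij=11}^{22} \frac{(-1)^{i+j}}{p_{ij}(x)}$$
$$\qquad + O(\epsilon^2(x)) + O(h^2 \epsilon(x)) + o((nh)^{-1}) + O(\epsilon^3(x)) + O(h^2 \epsilon^2(x)), \qquad (11)$$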



where $b_{ij}(x)$ is given below (6). From this, we propose two possible values of $\epsilon(x)$ which,
along with the plug-in estimator (estimator I, $\epsilon(x) = 0$), are summarized in Table 2. The first
one is just $\epsilon(x) = \nu_0/(2nhf(x))$, evidently canceling out the second term in (11). The second
one attempts to balance the first term also. Note that, with $h \sim n^{-1/4}$ as we suggested in
Section 2, the amendment $\epsilon(x) = \nu_0/(2nhf(x))$ (estimator II in Table 2) only simplifies but does
not explicitly reduce the asymptotic bias, as the first term in $h^2$ asymptotically dominates the
second one in (11). In fact, for this to provide a definite asymptotic bias reduction, we would
need $h^2 = o((nh)^{-1})$, which in turn requires $h = o(n^{-1/3})$. Demanding such a bandwidth
leads to a substantial amount of undersmoothing, to the extent that the variance dominates and
overwhelms any bias reduction achieved in the first place. This is to be avoided, and hence we
maintain a reasonable amount of undersmoothing, driven by $h \sim n^{-1/4}$.
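The bandwidth orders quoted above can be checked by comparing the rates of the two leading bias terms directly. The lines below are only a worked restatement of that argument, not an additional result:

```latex
h^2 = o\big((nh)^{-1}\big)
  \iff n h^3 \to 0
  \iff h = o\big(n^{-1/3}\big),
\qquad \text{whereas for } h \sim n^{-1/4}: \quad
h^2 \sim n^{-1/2} \gg n^{-3/4} \sim (nh)^{-1}.
```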
For $\epsilon(x)$ to explicitly reduce the asymptotic bias in this case, we need $\epsilon(x) \sim h^2$. With
$h^4 = o((nh)^{-1})$ (which is the case with $h = o(n^{-1/5})$), one can then rearrange (11) to
produce the second, more involved amendment; see estimator III in Table 2. We call the amended
estimator using this second value of $\epsilon(x)$ $\log(\check{\mathrm{OR}}^h(x))$, to distinguish it from the previous one.

Despite estimator III being the one which actually produces an asymptotic bias reduction, we
instead advocate the simpler amendment $\epsilon(x) = \nu_0/(2nhf(x))$, and thus $\log(\widetilde{\mathrm{OR}}^h(x))$, as the
preferred estimator of the pointwise log odds ratio. The reasons for this are four-fold: 1) the
adjustment $\epsilon(x) = \nu_0/(2nhf(x))$ has a simple form and interpretation. Intuitively, it is a direct
nonparametric analog of the 1/(2n) adjustment in (10); 2) the amendment $\epsilon(x) = \nu_0/(2nhf(x))$
is very simple to compute. In contrast, to calculate $\log(\check{\mathrm{OR}}^h(x))$, one needs to estimate the
bias terms $b_{ij}(x) = p''_{ij}(x)/2 + p'_{ij}(x) f'(x)/f(x)$. This could be done by plugging in kernel
estimates of the derivatives (Rodríguez-Campos, 1999), using local cubic smoothing (Fan and
Gijbels, 1996), or via bootstrapping (Rodríguez-Campos and Cao-Abad, 1993), although all of
these methods are challenging to implement; 3) unlike with $\epsilon(x) = \nu_0/(2nhf(x))$, there is
no guarantee that $\check{\mathrm{OR}}^h(x) > 0$, especially after substituting in the unknown quantities; 4) we
demonstrate empirically in Section 3.3 that, in finite samples, $\log(\check{\mathrm{OR}}^h(x))$ and $\log(\widetilde{\mathrm{OR}}^h(x))$
are similar with regard to bias, but the latter always has lower mean squared error (MSE).

It is interesting to point out that our discussion of choosing $\epsilon(x)$ somewhat mirrors discussions
regarding the two mainstream methods for dealing with bias in nonparametric regression
procedures: undersmoothing (Hall, 1992) and explicit bias correction (Neumann, 1995). In estimator
III, one would be making an explicit bias correction, whereas adopting $\epsilon(x) = \nu_0/(2nhf(x))$
with $h = o(n^{-1/3})$ is analogous to the approach of undersmoothing. By choosing estimator II
but keeping $h \sim n^{-1/4}$, we actually promote a hybrid approach which balances the two.
As a final note, with the general expression for the bias given by (11) and the asymptotic
variance used for constructing (8), we can derive an expression for the AMISE of the plug-in
estimator $\log(\widehat{\mathrm{OR}}^h(x))$. From there, it can be seen that, for the purpose of estimating the
pointwise log odds ratio, the asymptotically optimal bandwidth should be $h \sim n^{-1/5}$, the same order
as the optimal bandwidth when estimating the functions $p_{ij}$ themselves. This offers theoretical
justification for using a single undersmoothed bandwidth $h \sim n^{-1/4}$ throughout.



3.3 Simulation study 1 - Bias and mean squared error 
3.3.1 Design 

We conduct a simulation study to compare the three estimators shown in Table 2 in terms of
their bias and MSE. We also compare them to two model-based estimators: 1) an estimate of
log(OR(x)) based on the logistic regression of equation (2), given by $\widehat{\mathrm{OR}}(x) = e^{\hat{\beta}_2 + \hat{\beta}_3 x}$;
2) an estimate based on fitting the GAM (3). This was done using the mgcv package in
R (Wood, 2006). Three simulation models were designed:

$$X \sim \mathrm{Unif}[-2, 2]$$
$$p_{1\cdot}(x) = 0.07 e^{-x^2} + 0.47, \qquad p_{\cdot 1}(x) = 0.1/(1 + e^{x}) + 0.45$$
$$p_{11}(x) = p_{1\cdot}(x)\, p_{\cdot 1}(x) + \delta(x), \qquad p_{12}(x) = p_{1\cdot}(x)\, p_{\cdot 2}(x) - \delta(x)$$

with

Model A: $\delta(x) = 0.05 e^{-0.3x}$

Model B: $\delta(x) = 0.25 - \phi(x; -1, 1.8^2)$

Model C: $\delta(x) = 0.25\,\big((1 + e^{-6x})^{-1} - 0.5\big)$

where $\phi(x; \mu, \sigma^2)$ denotes the density of a normal distribution with mean $\mu$ and variance $\sigma^2$. The
above design can be interpreted as follows: if $\delta(x) = 0$, then $p_{ij}(x) = p_{i\cdot}(x)\, p_{\cdot j}(x)$ for all i, j = 1, 2
and thus log(OR(x)) = 0. The function $\delta(x)$ controls the degree of local association, as
shown in Figure 1, which depicts the true log odds ratio curves for the three models. The
design of our models, in particular our choices of $\delta(x)$, is such that the shapes of log(OR(x))
are representative of commonly encountered non-linear relative risk functions in epidemiology
(Zhao et al., 1996), whilst encompassing a realistic range of values.
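The three designs can be written down directly to inspect the true log odds ratio curves. The sketch below is illustrative only: the function names are hypothetical, the probabilities for cells 21 and 22 are obtained from the stated margins (they are implied once the margins and the first row are fixed), and the form of δ(x) for Model C assumes it is a logistic curve centred at zero.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# delta(x) for each model; Model C is an assumed reading (logistic curve centred at 0).
DELTAS = {
    "A": lambda x: 0.05 * math.exp(-0.3 * x),
    "B": lambda x: 0.25 - normal_pdf(x, -1.0, 1.8 ** 2),
    "C": lambda x: 0.25 * (1.0 / (1.0 + math.exp(-6.0 * x)) - 0.5),
}


def cell_probs(x, model):
    """True (p11, p12, p21, p22) at x for the given model label."""
    row1 = 0.07 * math.exp(-x * x) + 0.47      # p_1.(x)
    col1 = 0.1 / (1.0 + math.exp(x)) + 0.45    # p_.1(x)
    d = DELTAS[model](x)
    p11 = row1 * col1 + d
    p12 = row1 * (1.0 - col1) - d
    p21 = (1.0 - row1) * col1 - d              # implied by the margins
    p22 = (1.0 - row1) * (1.0 - col1) + d      # implied by the margins
    return p11, p12, p21, p22


def true_log_or(x, model):
    p11, p12, p21, p22 = cell_probs(x, model)
    return math.log(p11 * p22 / (p12 * p21))


# True log odds ratio curves on a coarse grid over the support of X.
grid = [-2.0 + 0.25 * k for k in range(17)]
curves = {m: [true_log_or(x, m) for x in grid] for m in DELTAS}
```

Setting δ(x) to zero in this construction makes the four cell probabilities products of their margins, so the log odds ratio is zero, in line with the interpretation given above.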



We assessed performance using empirical integrated absolute bias and MSE, calculated by first
working out the pointwise absolute bias and MSE in increments of 0.05 from x = -1.75 to
x = 1.75, then averaging over all the increments. It is essential to take the pointwise absolute
bias, i.e., ignore the sign, so that when averaging to produce the integrated bias, these values
do not cancel each other out due to symmetry. Also, even though the full support of x is from
-2 to 2, we limit ourselves to the interval (-1.75, 1.75) to avoid boundary bias (Fan and Gijbels,
1996). Sample sizes n = 50, 100, 250, 1000 were considered, with 4000 simulated datasets for
each n.
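The two summary measures can be sketched as follows, assuming the estimates are stored as one list per simulated dataset with one value per grid point. The function names and the toy numbers are hypothetical.

```python
def evaluation_grid(start=-1.75, step=0.05, n_points=71):
    """Grid from -1.75 to 1.75 in increments of 0.05 (71 points)."""
    return [round(start + step * i, 10) for i in range(n_points)]


def integrated_abs_bias_and_mse(estimates, truth):
    """Average over the grid of the pointwise |bias| and of the pointwise MSE.

    estimates[s][g] is the estimate from simulated dataset s at grid point g;
    truth[g] is the true value at grid point g.
    """
    n_sims = len(estimates)
    n_grid = len(truth)
    abs_bias_sum = 0.0
    mse_sum = 0.0
    for g in range(n_grid):
        errors = [estimates[s][g] - truth[g] for s in range(n_sims)]
        abs_bias_sum += abs(sum(errors) / n_sims)   # sign dropped before averaging over x
        mse_sum += sum(e * e for e in errors) / n_sims
    return abs_bias_sum / n_grid, mse_sum / n_grid


grid = evaluation_grid()
# Toy example: one dataset, biased by +1 at one point and -1 at the other.
# A signed average over x would be 0; the absolute version reports 1.
bias_toy, mse_toy = integrated_abs_bias_and_mse([[3.0, 1.0]], [2.0, 2.0])
```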

For the nonparametric estimators I-III, a Gaussian kernel was used with bandwidth selected
via direct plug-in (Ruppert et al., 1995) plus "manual" undersmoothing (multiplying the optimal
bandwidth by $n^{-1/20}$ so as to get a bandwidth proportional to $n^{-1/4}$, as is commonly done).
Strictly speaking, a Gaussian K(·) is not compactly supported on [-1, 1], although a slight
technical argument can be included to make the results above hold for such a choice (Collomb,
1976). For estimator III, the NW bias terms $b_{ij}(x)$ were estimated via the binary bootstrap
(Rodríguez-Campos and Cao-Abad, 1993, equation (6)).



3.3.2 Results 

In all models, estimator I performed poorly at the two smaller sample sizes (Tables 3 and 4). By
amending the estimated probabilities as in estimators II and III, the integrated bias was
substantially reduced (Table 3). As expected, estimator III produced the smallest integrated bias in
most configurations, although estimator II also performed quite competitively.

A major problem suffered by estimator III was that sometimes the estimates of the odds ratio
turned out to be negative. In Model B at n = 50, there were 1361 cases (out of 71 x 4000 =
284,000) where $\check{\mathrm{OR}}^h(x) < 0$. This occurrence of negative values was not resolved at larger
sample sizes, e.g., in Model C at n = 1000, there remained 105 cases of invalid estimates. In
contrast, estimator II cannot suffer from this problem, since it adds a strictly positive $\epsilon(x)$ to
estimated probabilities that are themselves nonnegative.

The shape of the true log odds ratio curves in Models B-C (see Figure 1) meant there was a
clear mis-specification of mean structure in fitting (2). Therefore, the GLM-based estimator
suffered from inflated bias even at large n (Table 3). In contrast, the flexibility of kernel
regression allowed estimators II and III to perform much better than their parametric counterpart.
The performance of the GAM-based estimator was somewhere in between the GLM-based
and the kernel-based estimators II and III. This is expected, given the 'hybrid' nature of the
GAM-based estimator between the purely linear-logistic expression in (2) and the entirely
nonparametric kernel-based methods.

Although its bias was higher compared to estimators II and III, the GLM-based estimator
performed best with regard to MSE in Model A. We found however that this was largely due to
the inadequacy of using the direct plug-in method (Ruppert et al., 1995) to select the bandwidth
for NW regression. For relatively flat functions like Model A, direct plug-in often leads to
significant undersmoothing (Signorini and Jones, 2004). To investigate this, we re-calculated
nonparametric estimators I-III in Model A, using the same 4000 simulated datasets at each n,
but this time estimating h via cross-validation (Härdle and Marron, 1985). Results showed
that for all three nonparametric estimators, there was a sizable decrease in integrated MSE (see
Supplementary Material). Moreover, the decrease is such that estimator II actually had a lower
integrated MSE than both the logistic regression and GAM estimators at all four sample sizes.
Comparing cross-validation and direct plug-in, we found that the average h based on the former
was roughly five times larger than for the latter.

For Models B-C, estimator II had the lowest MSE for all sample sizes (Table 4). Although
estimator III marginally outperformed II with regard to integrated bias (Table 3), the complexity
and additional variability of $\log(\check{\mathrm{OR}}^h(x))$ meant that it was estimator II which had
the lower MSE.

In conclusion, the simulation results presented here lead us to recommend using $\log(\widetilde{\mathrm{OR}}^h(x))$
as the preferred estimator of the pointwise log odds ratio. Unless stated otherwise, future
references to $\epsilon(x)$ will use the definition $\epsilon(x) = \nu_0/(2nhf(x))$ only.



4 Confidence intervals 

Given the strategy of adding the small value $\epsilon(x)$ to the conditional probabilities, a first attempt
at constructing confidence limits based on $\log(\widetilde{\mathrm{OR}}^h(x))$ would be to adjust (8) in an analogous
manner,

$$\log\big(\widetilde{\mathrm{OR}}^h(x)\big) \pm z_{1-\alpha/2} \sqrt{ \frac{\nu_0}{n h \hat{f}_h(x)} \sum_{ij=11}^{22} \frac{1}{\hat{p}^h_{ij}(x) + \frac{\nu_0}{2 n h \hat{f}_h(x)}} }. \qquad (12)$$

The form above is simple to work with, and parallels the variance formula discussed in Agresti
(2002, Section 3.1.1) for $\log(\mathrm{OR}_{\mathrm{adj}})$ in (10). However, although we expect this to work better
than (8), the use of resampling methods may offer even further improvements on this
asymptotic result with regard to coverage probability and/or interval width (Horowitz, 2001). Therefore
we explore this below. We also recognize that the delta method could have been applied directly
to $\log(\widetilde{\mathrm{OR}}^h(x))$, but we found this led to a very complex formula for the asymptotic variance,
and so have avoided it here.
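The interval (12) only needs the four NW probabilities and the density estimate at x, so it can be sketched as a small function. This is an illustration under stated assumptions: the value of the kernel constant assumes a Gaussian kernel, and the function name and the inputs in the example are hypothetical.

```python
import math

# Assumption: Gaussian kernel, for which the integral of K^2 is 1 / (2 * sqrt(pi)).
NU0_GAUSSIAN = 1.0 / (2.0 * math.sqrt(math.pi))


def amended_log_or_ci(p_hat, f_hat, n, h, nu0=NU0_GAUSSIAN, z=1.959964):
    """Amended log odds ratio (estimator II) and the interval (12).

    p_hat = (p11, p12, p21, p22) are NW estimates at x, f_hat the kernel
    density estimate at x, n the sample size and h the bandwidth.
    """
    eps = nu0 / (2.0 * n * h * f_hat)          # epsilon(x) = nu_0 / (2 n h f_hat(x))
    q = [p + eps for p in p_hat]               # amended probabilities, all > 0
    log_or = math.log(q[0] * q[3] / (q[1] * q[2]))
    se = math.sqrt(nu0 / (n * h * f_hat) * sum(1.0 / qij for qij in q))
    return log_or, (log_or - z * se, log_or + z * se)


# Hypothetical inputs with an empty cell (p12 = 0): the plug-in interval (8)
# is undefined here, whereas the amended one is finite.
estimate, (lower, upper) = amended_log_or_ci([0.5, 0.0, 0.2, 0.3], f_hat=0.25, n=200, h=0.3)
```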

To obtain bootstrap-based confidence intervals, we propose a new resampling procedure called
the multinomial-1 bootstrap, inspired by some ideas in Rodríguez-Campos and Cao-Abad
(1993) and developed in Hui and Geenens (2012).

Consider cell (i, j) in our 2 x 2 table, for which we have a binary response $Z_k^{(ij)}$ and its
corresponding covariate $X_k$, k = 1, . . . , n. For resampling methods to work here, two
requirements need to be satisfied: 1) the bootstrapped response variables $Z_k^{*(ij)}$ must be binary
and satisfy $\sum_{ij=11}^{22} Z_k^{*(ij)} = 1$; 2) we must capture the conditional nature of the probabilities
$p_{ij}(x) = P(Z^{(ij)} = 1 \mid X = x)$. The multinomial-1 bootstrap therefore works as follows:
first, estimate $p_{ij}(x)$ with $\hat{p}^g_{ij}(x)$ using a pilot bandwidth g (instead of h) to obtain the vector
$\hat{p}^g(x) = (\hat{p}^g_{11}(x), \hat{p}^g_{12}(x), \hat{p}^g_{21}(x), \hat{p}^g_{22}(x))^t$. Then for k = 1, . . . , n, we simulate a bootstrap
response vector $Z^*_k = (Z_k^{*(11)}, Z_k^{*(12)}, Z_k^{*(21)}, Z_k^{*(22)})^t$ from

$$Z^*_k \sim \mathrm{Multinomial}\big(1, \hat{p}^g(X_k)\big).$$

Having obtained the bootstrap sample $(X_k, Z^*_k)$, we re-perform kernel regression using the
previous $h \sim n^{-1/4}$ to obtain $\hat{p}^{*h}_{ij}(x)$ and hence the vector $\hat{p}^{*h}(x)$. Use of an initial oversmoothed
g is typical when the bootstrap is used in nonparametric regression (see for instance Härdle and
Marron, 1991), and is necessary to properly account for the bias inherent in kernel regression.
A pilot bandwidth $g \sim n^{-1/9}$ has been proved to be optimal for that purpose, and this is also
what we will use in this work. By extending the theory of Rodríguez-Campos and Cao-Abad
(1993), it may be shown that the multinomial-1 bootstrap produces a consistent estimator of
the distribution of $\log(\widetilde{\mathrm{OR}}^h(x))$ (see Appendix A). Percentile bootstrap confidence intervals based on
$\log(\widetilde{\mathrm{OR}}^h(x))$ are thus obtained by generating a sufficiently large number of bootstrapped datasets, and
calculating the (α/2) and (1 - α/2) quantiles of $\log(\widetilde{\mathrm{OR}}^{*h}(x)) - \log(\widetilde{\mathrm{OR}}^{g}(x))$. Denoting these
quantiles by $l^*(x)$ and $u^*(x)$ respectively, a 100(1 - α)% bootstrap confidence interval for
log(OR(x)) is given by

$$\Big( \log\big(\widetilde{\mathrm{OR}}^h(x)\big) - u^*(x),\ \log\big(\widetilde{\mathrm{OR}}^h(x)\big) - l^*(x) \Big). \qquad (13)$$
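The resampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, encodes each response vector as a cell index 0 to 3 (for cells 11, 12, 21, 22), uses a simple order-statistic quantile, and assumes the centring term is the amended estimator computed with the pilot bandwidth g. All function names and the toy data are hypothetical.

```python
import math
import random

NU0 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K^2 for the Gaussian kernel


def nw_probs(x, xs, cells, bw):
    """NW cell probabilities (p11, p12, p21, p22) and kernel density estimate at x."""
    w = [math.exp(-0.5 * ((x - xk) / bw) ** 2) / math.sqrt(2.0 * math.pi) for xk in xs]
    total = sum(w)
    probs = [0.0, 0.0, 0.0, 0.0]
    for wk, c in zip(w, cells):
        probs[c] += wk / total
    return probs, total / (len(xs) * bw)


def amended_log_or(x, xs, cells, bw):
    """Estimator II: add eps(x) = nu_0 / (2 n bw f_hat(x)) to each NW probability."""
    p, f_hat = nw_probs(x, xs, cells, bw)
    eps = NU0 / (2.0 * len(xs) * bw * f_hat)
    q = [pij + eps for pij in p]
    return math.log(q[0] * q[3] / (q[1] * q[2]))


def multinomial1_bootstrap_ci(x, xs, cells, h, g, n_boot=500, alpha=0.05, seed=0):
    rng = random.Random(seed)
    # Step 1: pilot estimates p^g(X_k) at every observed covariate value.
    pilot = [nw_probs(xk, xs, cells, g)[0] for xk in xs]
    centre = amended_log_or(x, xs, cells, g)  # assumption: centring uses bandwidth g
    diffs = []
    for _ in range(n_boot):
        # Step 2: draw Z*_k ~ Multinomial(1, p^g(X_k)), encoded as a cell index.
        cells_star = [rng.choices([0, 1, 2, 3], weights=pk)[0] for pk in pilot]
        # Step 3: re-estimate with the working bandwidth h.
        diffs.append(amended_log_or(x, xs, cells_star, h) - centre)
    diffs.sort()
    lo_idx = max(0, math.ceil(n_boot * alpha / 2.0) - 1)
    hi_idx = min(n_boot - 1, math.ceil(n_boot * (1.0 - alpha / 2.0)) - 1)
    l_star, u_star = diffs[lo_idx], diffs[hi_idx]
    estimate = amended_log_or(x, xs, cells, h)
    return estimate - u_star, estimate - l_star   # interval (13)


# Toy data, for illustration only.
data_rng = random.Random(1)
xs = [data_rng.uniform(-2.0, 2.0) for _ in range(80)]
cells = [data_rng.randrange(4) for _ in range(80)]
n = len(xs)
lower, upper = multinomial1_bootstrap_ci(
    0.0, xs, cells, h=n ** (-1 / 4), g=n ** (-1 / 9), n_boot=200
)
```

Drawing a single cell index per observation guarantees that each bootstrap response is binary and that its four components sum to one, which is the first requirement listed above; drawing from the pilot probabilities at each observed covariate value addresses the second.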
4.1 Simulation study 2 - coverage probabilities



We compare the three confidence intervals for log(OR(x)), as represented by (8), (12) and (13),
in terms of their empirical coverage probability (ECP) and mean length (on a log scale). ECP
is defined as the number of times the true pointwise log odds ratio lies within the generated
CIs (nominal level 95%), divided by the total number of replications. We used Models A-C
established in Section 3.3, with n = 50, 100, 250 and 1000 simulated datasets for each n. CIs
were calculated at values x = -1, 0, 1.5. For the bootstrap CIs, we used B = 500 resamples.
Initially we tested B = 1000, but found 500 replications produced similar intervals. The results
are shown in Table 5.

For all three models, the delta method procedure based on the plug-in estimator I (DM-I) led
to conservative CIs, i.e., high ECP, for n = 50 and 100. DM-I also had the widest confidence
intervals for all sample sizes. Such wide intervals (on a log scale) will be of little use to the
applied researcher when attempting to determine a realistic range of values for the true OR(x).
Applying the delta method to $\log(\widetilde{\mathrm{OR}}^h(x))$ (DM-II) led to CIs with much smaller interval
lengths, without any consistent decrease in ECP. The bootstrap percentile CIs (M1B-II)
performed best, having almost always the smallest interval lengths with similar ECP. For locations
where log(OR(x)) was substantially different from 0, e.g., Model B at x = 1.5 and Model C at
x = -1, 1.5, bootstrap-based intervals offered useful decreases in average CI length without
being further away from the nominal 95% coverage probability. Specifically, while the delta
method intervals tended to have ECP > 95%, the bootstrap CIs often had coverage slightly
below 95%. One could favor the former in the name of conservatism; however, the absolute
deviations from the targeted 95% level were very similar between the two methods. For n = 250,
both DM-II and M1B-II performed equally well with regard to ECP and interval width.



5 A real-data application 

We illustrate the application of the methods developed to the ICU dataset discussed in Section
1. We are interested in exploring how the strength and direction of the association between
patient survival following hospital discharge and type of admission varies with the age of the
patients. To begin, a Pearson $\chi^2$ test on Table 1 provided strong evidence against global
independence (p-value = 0.001), and the global odds ratio estimate OR = 8.89 indicated that the
odds of dying following an emergency admission were almost 9 times those for an elective admission,
and could be as high as 38 times (95% Wald CI: [2.064; 38.290]). Although this conclusion is
expected, it should be subject to further investigation, particularly in light of the hypothesis that
the strength of this association may be weaker for young adults.

We first approached this investigation using logistic regression, with results indicating the main
effect of admission type was significant given age (p-value < 0.001). The interaction term
between age and admission type however was not significant in this model (p-value = 0.622),
meaning the odds ratio, despite being significantly greater than 1 ($e^{2.983} \approx 19.747$), did not
appear to vary with age. Persisting with the interaction model, the log odds ratio estimate actually
shows a decline with increasing age (Figure 2, solid line). We also fitted a GAM, with
penalized regression splines and penalty chosen via GCV, using the "by" argument available
in the mgcv package (Wood, 2006). The resulting log odds ratio curve closely follows the fit
from logistic regression (Figure 2, dotted line).

As an alternative to model-based approaches, we decided to use the pointwise log odds ratio
estimated using $\log(\widetilde{\mathrm{OR}}^h(x))$ (estimator II in Table 2). The result is plotted as the dashed curve
in Figure 2. Between ages 50 and 86, $\log(\widetilde{\mathrm{OR}}^h(x))$ hovered around 2.5 which, in reasonable
agreement with logistic regression, provided strong evidence that the odds of death for
patients discharged following an emergency admission are significantly higher than for those
discharged following an elective admission. However, for ages less than 50, $\log(\widetilde{\mathrm{OR}}^h(x))$ drops to become
non-significant. This is in contrast to both the logistic regression and GAM fits, which were not
able to provide any notion of this dampening.

To further verify whether this decrease is substantiated, 95% pointwise bootstrap confidence
intervals (B = 1000) were calculated at ages 30, 50 and 70. At both ages 50 (CI: [1.607; 4.157])
and 70 (CI: [0.260; 3.076]) the limits were above 0, and confirmed that for older patients the
odds of death were significantly higher for patients admitted for emergency reasons. However,
for age 30 (CI: [-1.633; 1.394]) the confidence interval contains log(OR(x)) = 0, indicating
that for younger patients, there is no strong evidence to suggest that type of admission into ICU
affects the odds of survival.



6 Concluding remarks 

In this paper, we developed a new measure of local association by extending the standard odds
ratio using conditional probabilities, and estimating these probabilities nonparametrically
using kernel regression to allow maximum flexibility. Three estimators of log(OR(x)) were
proposed, from which we recommend the amended estimator $\log(\widetilde{\mathrm{OR}}^h(x))$, which is both
simple to calculate and has good bias/MSE properties. We formulated confidence intervals based
on $\log(\widetilde{\mathrm{OR}}^h(x))$ using both asymptotic arguments and an innovative multinomial-1 bootstrap
procedure.

One particular issue we did not explore is bandwidth selection for our estimators of
$\log(OR(x))$. For kernel regression in general, there is no single best method of selecting
the bandwidth. The direct plug-in method tends to perform well in practice for estimating the
functions $p_{ij}$ in many cases (Signorini and Jones, 2004; Ruppert et al., 1995), which is
why we chose it for this work. However, there is no guarantee that it performs as well for
estimating our pointwise log odds ratio. Consequently, further studies are needed to evaluate
various approaches to choosing h in this setting. Indeed, the results from the first
simulation in Section 3.3 provide clear evidence that a thorough comparison of the methods
for selecting h is necessary.
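To make the bandwidth question concrete, here is a minimal Python sketch of one alternative selector, leave-one-out cross-validation for a kernel regression of a binary indicator on x. It is not the direct plug-in method used in this work. The Nadaraya-Watson form, the Epanechnikov kernel, the squared-error criterion and all function names are assumptions made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a density with compact support on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def loo_cv_score(h, x, y):
    """Leave-one-out mean squared prediction error of the kernel fit of y on x."""
    w = epanechnikov((x[:, None] - x[None, :]) / h)
    np.fill_diagonal(w, 0.0)            # exclude each point from its own fit
    denom = w.sum(axis=1)
    if np.any(denom <= 0):              # some point has no neighbour within h
        return np.inf
    fit = (w @ y) / denom
    return float(np.mean((y - fit) ** 2))

def select_bandwidth(x, y, grid):
    """Return the grid value minimising the leave-one-out score."""
    scores = [loo_cv_score(h, x, y) for h in grid]
    return grid[int(np.argmin(scores))]
```

Here `y` would be the 0/1 indicator of one cell, for example `y = ((r == 1) & (s == 1)).astype(float)`. A bandwidth that is good for a single $p_{ij}$ need not be good for the log odds ratio, which is the open question raised above.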

In the future, we hope to develop model-free nonparametric association measures beyond the
pointwise odds ratio, e.g., pointwise relative risk, pointwise Kendall's tau and so on. How
confidence intervals can be established for these quantities is also of interest. Finally, the
use of kernel regression means that, due to the curse of dimensionality, it is inefficient to
produce a pointwise odds ratio which is 'local' with respect to many covariates. Using
semiparametric methods, e.g., single index models, to estimate the conditional probabilities
instead may overcome this problem.
Tables and Figures



Table 1: Dataset of 200 patients discharged from an adult ICU, classified according to survival status and type
of admission.

                     Status
Admission        Died     Lived
Emergency          38       109
Elective            2        51
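For reference, the classical (unconditional) odds ratio of equation (1) can be computed directly from Table 1. The Python sketch below does so. The standard error uses the usual large-sample formula for a log odds ratio, which is a textbook result rather than something taken from this paper, and the variable names are illustrative.

```python
import math

# Counts from Table 1: rows = admission type, columns = (died, lived).
n11, n12 = 38, 109   # emergency
n21, n22 = 2, 51     # elective

# Sample odds ratio: the counts' common factor 1/n cancels in equation (1).
odds_ratio = (n11 * n22) / (n12 * n21)
log_odds_ratio = math.log(odds_ratio)

# Large-sample standard error of the log odds ratio (sum of reciprocal counts).
se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

print(round(odds_ratio, 2), round(log_odds_ratio, 2), round(se_log_or, 2))
```

This gives an odds ratio of roughly 8.9 for the whole table. It is a single summary over all ages, which is what the pointwise odds ratio is designed to refine.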



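Table 2: Summary of the three kernel-based estimators for the pointwise log odds ratio proposed in this work.

Estimator   Notation                        Amendment
I           $\log(\widehat{OR}^{h}(x))$     $\epsilon(x) = 0 \;\; \forall x \in S_X$
II          $\log(\widehat{OR}^{h}(x))$     $\epsilon(x) = \nu_0 / (2nhf(x))$
III         $\log(\widehat{OR}^{h}(x))$     $\epsilon(x) = \dfrac{\nu_0}{2nhf(x)} - h^2 \kappa_2 \dfrac{\sum_{ij=11}^{22} (-1)^{i+j}\, b_{ij}(x)/p_{ij}(x)}{\sum_{ij=11}^{22} (-1)^{i+j}\, 1/p_{ij}(x)}$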
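The amendment of estimator II can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. It assumes a Nadaraya-Watson form for the cell probabilities, an Epanechnikov kernel (for which $\nu_0 = \int K^2(u)\,du = 3/5$), a kernel density estimate in place of $f(x)$, and levels coded 1 and 2. The function names are hypothetical.

```python
import numpy as np

NU0 = 0.6  # integral of K(u)^2 for the Epanechnikov kernel

def epanechnikov(u):
    """Epanechnikov kernel, a density with compact support on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def pointwise_log_or(x0, x, r, s, h, amend=True):
    """Kernel estimate of log(OR(x0)); r and s take values in {1, 2}."""
    w = epanechnikov((x0 - x) / h)
    total = w.sum()
    if total <= 0:
        raise ValueError("no observations within one bandwidth of x0")
    n = x.size
    f_hat = total / (n * h)                              # kernel density estimate at x0
    eps = NU0 / (2.0 * n * h * f_hat) if amend else 0.0  # amendment of estimator II
    # Kernel-weighted proportion of each cell (i, j) at x0.
    p = {(i, j): w[(r == i) & (s == j)].sum() / total
         for i in (1, 2) for j in (1, 2)}
    return (np.log(p[1, 1] + eps) + np.log(p[2, 2] + eps)
            - np.log(p[1, 2] + eps) - np.log(p[2, 1] + eps))
```

With these choices `eps` reduces to `NU0 / (2 * total)`, so it keeps the estimate finite when a cell has zero local weight, much as adding a small constant to each cell does for the classical odds ratio.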
Table 3: Integrated absolute bias of the three nonparametric (I, II and III), the GLM, and the GAM estimators of
$\log(OR(x))$ for Models A-C at various sample sizes n. The best estimator in each configuration is highlighted in
bold.

Model   n       I        II       III      GLM      GAM
A       50      0.465    0.036    0.016    0.102    0.365
        100     [values illegible in source]
        250     0.036    0.001    0.012    0.044    0.030
        1000    0.012    0.006    0.006    0.040    0.011
B       50      1.200    0.111    0.094    0.408    0.463
        100     [values illegible in source]
        250     0.075    0.024    0.027    0.357    0.082
        1000    0.034    0.020    0.020    0.354    0.053
C       50      0.879    0.212    0.194    0.637    0.541
        100     0.291    0.113    0.100    0.608    0.415
        250     0.124    0.093    0.088    0.607    0.264
        1000    0.076    0.066    0.061    0.607    0.163
Table 4: Integrated MSE of the three nonparametric (I, II and III), the GLM, and the GAM estimators of
$\log(OR(x))$ for Models A-C at various sample sizes n. The best estimator in each configuration is highlighted in
bold.

Model   n       I        II       III      GLM      GAM
A       50      6.152    1.265    1.386    0.893    1.076
        100     [values illegible in source]
        250     0.323    0.293    0.309    0.127    0.197
        1000    0.077    0.075    0.079    0.042    0.045
B       50      17.496   1.209    1.308    1.238    1.863
        100     [values illegible in source]
        250     0.398    0.292    0.311    0.300    0.293
        1000    0.096    0.053    0.057    0.192    0.058
C       50      11.516   1.311    1.467    2.117    2.041
        100     2.062    0.828    0.953    0.963    1.263
        250     0.459    0.350    0.428    0.649    0.428
        1000    0.132    0.079    0.130    0.534    0.246



[Table 5: rotated table; its contents could not be recovered from the source.]



Figure 1: True pointwise log odds ratio $\log(OR(x))$ as a function of x for the three simulation models.




Figure 2: Local log odds ratio of death for patients admitted to ICU for emergency reasons relative to those
admitted for elective reasons, plotted against age. Plotted are the estimates from a logistic regression model fitted
with an interaction effect (solid line), a GAM fit (dotted line), and the pointwise log odds ratio based on estimator II,
$\log(\widehat{OR}^{h}(x))$ (dashed line). A horizontal line at $\log(OR(x)) = 0$ marks local independence.
A Proofs

We begin by recalling some standard results of kernel regression theory, which have been
adapted to our context of a 2 × 2 contingency table. The following regularity assumptions are
made:




Assumption A.1. The functions $p_{ij}(x)$, $i, j = 1, 2$, are bounded away from 0 and 1. Also, the
marginal density of $X$, $f$, is bounded away from 0 on its compact support, $S_X$. All functions
$p_{ij}(x)$ and $f$ are assumed to be four times differentiable on $S_X$.

Assumption A.2. The kernel $K(\cdot)$ is a probability density function symmetric about 0 with
compact support on $[-1, 1]$.

Assumption A.3. The common bandwidth $h = h_n$ satisfies $h \to 0$ and $nh \to \infty$ as $n \to \infty$.
In addition, to avoid the differing behavior kernel regression has near the boundary of the space of $X$
(Fan and Gijbels, 1996), $S_X$ is reduced to an interior support
$S_X^{h} = \{x \in S_X : l_X + h \le x \le u_X - h\}$, where $l_X$ and $u_X$ are the lower and upper
bounds of $S_X$. Following this, we have the following result, adapted from Wand and Jones (1995).



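Theorem A.1. Under Assumptions 2.1 and A.1-A.3, it holds for $i, j = 1, 2$ and $x \in S_X^{h}$ that

\[
E(\hat{p}^{h}_{ij}(x)) = p_{ij}(x) + h^2 \kappa_2 b_{ij}(x) + O(h^4),
\]
\[
\mathrm{Var}(\hat{p}^{h}_{ij}(x)) = \frac{\nu_0}{nhf(x)}\, p_{ij}(x)(1 - p_{ij}(x)) + o((nh)^{-1}).
\]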



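Although already stated in the main body of the paper, we recall here the following result: if
$\lim_{n \to \infty} \sqrt{nh^5} = \lambda$ with $0 \le \lambda < \infty$, then

\[
\sqrt{nh}\,(\hat{p}^{h}_{ij}(x) - p_{ij}(x)) \xrightarrow{\mathcal{L}} N\!\left(\lambda \kappa_2 b_{ij}(x),\; \frac{\nu_0}{f(x)}\, p_{ij}(x)(1 - p_{ij}(x))\right).
\]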
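The kernel constants in these expressions are easy to evaluate numerically. The Python sketch below assumes the usual definitions $\kappa_2 = \int u^2 K(u)\,du$ and $\nu_0 = \int K^2(u)\,du$ (they are defined in the main text, not in this appendix) and uses the Epanechnikov kernel as one kernel satisfying Assumption A.2. The function names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a density symmetric about 0 with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_constants(kernel, m=200_000):
    """Midpoint-rule approximations on [-1, 1] of the integrals of K, u^2 K and K^2."""
    du = 2.0 / m
    u = -1.0 + du * (np.arange(m) + 0.5)
    k = kernel(u)
    return k.sum() * du, (u**2 * k).sum() * du, (k**2).sum() * du

def asymptotic_sd(p, f_x, n, h, nu0):
    """Leading-order standard deviation of the kernel estimate of p_ij(x)."""
    return np.sqrt(nu0 * p * (1.0 - p) / (n * h * f_x))

mass, kappa2, nu0 = kernel_constants(epanechnikov)
print(mass, kappa2, nu0)
```

For this kernel the exact values are 1, 1/5 and 3/5. Other compactly supported kernels change the constants but not the form of the results.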



As explained in Section 2, we treat the bias term via undersmoothing, and we thus replace
Assumption A.3 by

Assumption A.4. The common bandwidth $h = h_n$ satisfies $nh^5 \to 0$ and $nh \to \infty$ as $n \to \infty$.
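To see which bandwidth rates Assumption A.4 allows, consider the polynomial family $h = c\,n^{-a}$ with $c > 0$. This family is chosen here only for illustration.

```latex
\[
nh^5 = c^5\, n^{1 - 5a} \to 0 \iff a > \tfrac{1}{5},
\qquad
nh = c\, n^{1 - a} \to \infty \iff a < 1 .
\]
```

Any exponent $a \in (1/5, 1)$ therefore satisfies Assumption A.4. The rate $n^{-1/5}$ itself is excluded, since it gives $\sqrt{nh^5} \to c^{5/2} > 0$ and hence a non-vanishing bias in the limiting distribution.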



The results of Theorem A.1 are unchanged under this assumption, but the bias in the normality
statement asymptotically vanishes and one instead obtains

\[
\sqrt{nh}\,(\hat{p}^{h}_{ij}(x) - p_{ij}(x)) \xrightarrow{\mathcal{L}} N\!\left(0,\; \frac{\nu_0}{f(x)}\, p_{ij}(x)(1 - p_{ij}(x))\right)
\]

and its vectorial version

\[
\sqrt{nh}\,(\hat{p}^{h}(x) - p(x)) \xrightarrow{\mathcal{L}} N\!\left(0,\; \frac{\nu_0}{f(x)} \left(\mathrm{diag}(p(x)) - p(x)p(x)^{T}\right)\right) \tag{14}
\]

where $\mathrm{diag}(p(x))$ denotes a 4 × 4 diagonal matrix with elements equal to the components of
$p(x)$.
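One consequence of (14) can be checked numerically. Assuming the covariance has the multinomial form $\frac{\nu_0}{f(x)}(\mathrm{diag}(p) - pp^{T})$, the delta method applied to the log odds ratio gives the variance $\frac{\nu_0}{f(x)} \sum_{ij} 1/p_{ij}(x)$, because the gradient $((-1)^{i+j}/p_{ij})$ is orthogonal to $p$. The Python sketch below verifies this identity. The function name and numerical inputs are illustrative.

```python
import numpy as np

def log_or_asymptotic_variance(p, f_x, nu0):
    """Delta-method variance of sqrt(nh) * log-odds-ratio under the covariance in (14)."""
    p = np.asarray(p, dtype=float)                   # ordered (p11, p12, p21, p22)
    sigma = (nu0 / f_x) * (np.diag(p) - np.outer(p, p))
    signs = np.array([1.0, -1.0, -1.0, 1.0])         # (-1)^(i+j) for each cell
    grad = signs / p                                 # gradient of log OR w.r.t. p
    return float(grad @ sigma @ grad)

# Illustrative inputs: the cell proportions of Table 1, f(x) = 1, nu0 = 0.6.
p = np.array([0.19, 0.545, 0.01, 0.255])
variance = log_or_asymptotic_variance(p, f_x=1.0, nu0=0.6)
closed_form = 0.6 * np.sum(1.0 / p)
print(variance, closed_form)
```

The closed form mirrors the classical sum of reciprocal cell counts, with $nhf(x)/\nu_0$ playing the role of a local sample size.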

A.1 A General Expression for Bias($\log(\widehat{OR}^{h}(x))$)

We begin by evaluating $E(\log(\hat{p}^{h}_{ij}(x) + \epsilon(x)))$. To clarify, $\epsilon(x)$ is a function of $x$ but
independent of $i, j$, i.e., the same value is added to each of the conditional probabilities. We also want
$\epsilon(x) \to 0$ as $n \to \infty$, since the bias of $\hat{p}^{h}_{ij}(x)$ becomes negligible at large $n$ and there is
less need to adjust for it. Rewriting it as follows,

\[
\log(\hat{p}^{h}_{ij}(x) + \epsilon(x)) = \log(p_{ij}(x)) + \log\!\left(1 + \frac{\hat{p}^{h}_{ij}(x) + \epsilon(x) - p_{ij}(x)}{p_{ij}(x)}\right) \tag{15}
\]

we need only consider the second term. Denoting $r^{h}_{ij}(x) = (\hat{p}^{h}_{ij}(x) + \epsilon(x) - p_{ij}(x))/p_{ij}(x)$,
we have the following lemma regarding its moments.



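Lemma A.1. Under Assumptions 2.1, A.1, A.2 and A.4, if $\epsilon(x) \to 0$ then

\[
E(r^{h}_{ij}(x)) = \frac{1}{p_{ij}(x)}\left(\epsilon(x) + h^2 \kappa_2 b_{ij}(x)\right) + o((nh)^{-1}),
\]
\[
E((r^{h}_{ij}(x))^2) = \frac{\nu_0}{nhf(x)}\,\frac{1 - p_{ij}(x)}{p_{ij}(x)} + \frac{1}{p_{ij}(x)^2}\left(\epsilon^2(x) + 2h^2 \epsilon(x) \kappa_2 b_{ij}(x)\right) + o((nh)^{-1}),
\]
\[
E((r^{h}_{ij}(x))^3) = \frac{1}{p_{ij}(x)^3}\left(\epsilon^3(x) + 3\epsilon^2(x) h^2 \kappa_2 b_{ij}(x)\right) + o((nh)^{-1}).
\]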



Proof. The first and second statements follow immediately from Theorem A.1. The third moment
follows from the cubic expansion
$E((\hat{p}^{h}_{ij}(x) + \epsilon(x) - p_{ij}(x))^3) = E((\hat{p}^{h}_{ij}(x) - p_{ij}(x))^3) + 3\epsilon(x) E((\hat{p}^{h}_{ij}(x) - p_{ij}(x))^2) + 3\epsilon^2(x) E(\hat{p}^{h}_{ij}(x) - p_{ij}(x)) + \epsilon^3(x)$,
and from the result of Geenens and Simar (2010) that for $h = o(n^{-1/5})$,
$E((\hat{p}^{h}_{ij}(x) - p_{ij}(x))^4) = O((nh)^{-2})$, which implies
$E(|\hat{p}^{h}_{ij}(x) - p_{ij}(x)|^3) = O((nh)^{-3/2}) = o((nh)^{-1})$.

□
The above result can be combined with the general formula for the log amended estimator,
given by (9) in the main text, to produce the following:



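Lemma A.2. Under Assumptions 2.1, A.1, A.2 and A.4, it holds for $x \in S_X^{h}$ that if $\epsilon(x) \to 0$
then

\[
E(\log(\widehat{OR}^{h}(x))) = \log(OR(x)) + h^2 \kappa_2 \sum_{ij=11}^{22} (-1)^{i+j} \frac{b_{ij}(x)}{p_{ij}(x)}
+ \left(\epsilon(x) - \frac{\nu_0}{2nhf(x)}\right) \sum_{ij=11}^{22} (-1)^{i+j} \frac{1}{p_{ij}(x)}
\]
\[
\qquad + O(\epsilon^2(x)) + O(h^2 \epsilon(x)) + o((nh)^{-1}) + O(\epsilon^3(x)) + O(h^2 \epsilon^2(x)).
\]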
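The role of the amendments in Table 2 can be read off this expression. As a short worked step, which is our own elaboration of the lemma, compare the unamended choice with the amendment of estimator II:

```latex
\begin{align*}
\epsilon(x) = 0: \quad
  & E(\log(\widehat{OR}^{h}(x))) - \log(OR(x))
    = h^2 \kappa_2 \sum_{ij=11}^{22} (-1)^{i+j} \frac{b_{ij}(x)}{p_{ij}(x)}
      - \frac{\nu_0}{2nhf(x)} \sum_{ij=11}^{22} (-1)^{i+j} \frac{1}{p_{ij}(x)}
      + o((nh)^{-1}), \\
\epsilon(x) = \frac{\nu_0}{2nhf(x)}: \quad
  & E(\log(\widehat{OR}^{h}(x))) - \log(OR(x))
    = h^2 \kappa_2 \sum_{ij=11}^{22} (-1)^{i+j} \frac{b_{ij}(x)}{p_{ij}(x)}
      + o((nh)^{-1}).
\end{align*}
```

In the second case $\epsilon(x) = O((nh)^{-1})$, so $O(\epsilon^2(x)) = O((nh)^{-2})$ and $O(h^2\epsilon(x)) = O(h/n)$ are both $o((nh)^{-1})$. The amendment removes the term of order $(nh)^{-1}$ that arises from taking the logarithm of a noisy estimate, leaving only the smoothing bias of order $h^2$.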



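Proof. Writing $E(\log(\widehat{OR}^{h}(x))) = \sum_{ij=11}^{22} (-1)^{i+j} E(\log(\hat{p}^{h}_{ij}(x) + \epsilon(x)))$, we can use (15)
to find

\[
E(\log(\widehat{OR}^{h}(x))) = \sum_{ij=11}^{22} (-1)^{i+j} \log(p_{ij}(x)) + \sum_{ij=11}^{22} (-1)^{i+j} E(\log(1 + r^{h}_{ij}(x)))
\]
\[
\qquad = \log(OR(x)) + \sum_{ij=11}^{22} (-1)^{i+j} E(\log(1 + r^{h}_{ij}(x))). \tag{16}
\]

Next, we apply a Taylor expansion $\log(1 + r^{h}_{ij}(x)) = r^{h}_{ij}(x) - (r^{h}_{ij}(x))^2/2 + R(r^{h}_{ij}(x))$, where
the remainder term can be written as

\[
R(r^{h}_{ij}(x)) = \frac{(r^{h}_{ij}(x))^3}{3(1 + \theta r^{h}_{ij}(x))^3}
\]

for some $\theta \in (0, 1)$. If $r^{h}_{ij}(x) > 0$, then

\[
0 < E(R(r^{h}_{ij}(x))) < E\!\left(\frac{(r^{h}_{ij}(x))^3}{3}\right) < E((r^{h}_{ij}(x))^3).
\]

We also know $r^{h}_{ij}(x) \to 0$ in probability, as $\hat{p}^{h}_{ij}(x)$ is a consistent estimator of $p_{ij}(x)$ and
$\epsilon(x) \to 0$. Thus, for $r^{h}_{ij}(x) < 0$, we can also write, provided $n$ is large enough,

\[
E((r^{h}_{ij}(x))^3) < E\!\left(\frac{(r^{h}_{ij}(x))^3}{3(1 + r^{h}_{ij}(x))^3}\right) < E\!\left(\frac{(r^{h}_{ij}(x))^3}{3(1 + \theta r^{h}_{ij}(x))^3}\right)
\]

where the first inequality holds because for $z$ negative but not too far away from 0, we have
$z^3 < z^3/(3(1 + z)^3)$. Hence,

\[
|E(R(r^{h}_{ij}(x)))| \le |E((r^{h}_{ij}(x))^3)| = O(\epsilon^3(x)) + O(h^2 \epsilon^2(x)) + o((nh)^{-1}) \tag{17}
\]

as $n \to \infty$, from Lemma A.1. Now, from the Taylor expansion, we get

\[
E(\log(1 + r^{h}_{ij}(x))) = E(r^{h}_{ij}(x)) - \frac{1}{2} E((r^{h}_{ij}(x))^2) + E(R(r^{h}_{ij}(x)))
\]

and using Lemma A.1 again and (17) it follows

\[
E(\log(1 + r^{h}_{ij}(x))) = \frac{1}{p_{ij}(x)}\left(\epsilon(x) + h^2 \kappa_2 b_{ij}(x)\right) - \frac{\nu_0}{2nhf(x)}\,\frac{1 - p_{ij}(x)}{p_{ij}(x)}
\]
\[
\qquad + O(\epsilon^2(x)) + O(h^2 \epsilon(x)) + o((nh)^{-1}) + O(\epsilon^3(x)) + O(h^2 \epsilon^2(x))
\]

as $n \to \infty$. Plugging this into (16), and noting that $\sum_{ij=11}^{22} (-1)^{i+j} = 0$, yields the announced result.

□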
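The bound on the Taylor remainder used in the proof can be checked numerically. The Python sketch below evaluates $R(r) = \log(1 + r) - r + r^2/2$ on a grid and confirms $|R(r)| \le |r|^3$ there. The range $[-0.3, 1]$ is our own illustrative reading of "not too far away from 0".

```python
import numpy as np

def remainder(r):
    """Second-order Taylor remainder R(r) = log(1 + r) - r + r^2 / 2."""
    return np.log1p(r) - r + 0.5 * r**2

# Grid of values of r; the lower end -0.3 is an illustrative choice.
r = np.linspace(-0.3, 1.0, 1301)
R = remainder(r)

# |R(r)| <= |r|^3 on the grid (tiny tolerance for floating-point rounding).
bound_holds = bool(np.all(np.abs(R) <= np.abs(r) ** 3 + 1e-15))
print(bound_holds)
```

Since $r^{h}_{ij}(x) \to 0$ in probability, values of $r$ outside such a range become increasingly unlikely as $n$ grows, which is why the argument is stated for $n$ large enough.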



A.2 Validity of the Multinomial-1 Bootstrap

We begin by trying to mimic via bootstrap the asymptotic normality statement of $\hat{p}^{h}_{ij}(x)$ as
formulated in (7). For the pilot bandwidth the following assumption is made:

Assumption A.5. The common pilot bandwidth $g = g_n$ is to be taken asymptotically larger
than the optimal bandwidth $h_{opt}$, that is, $h_{opt} = o(g)$.

One can see that with $h_{opt} \sim n^{-1/5}$, the rate chosen for $g$ in the main work satisfies
this. The main result of applying the multinomial-1 bootstrap procedure described in Section 4 is
encompassed in the following theorem, adapted from Rodriguez-Campos and Cao-Abad
(1993).



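Theorem A.2. Under Assumptions 2.1, A.1, A.2, A.4 and A.5, for $x \in S_X^{g}$, $z \in \mathbb{R}$ and $i, j = 1, 2$,
it holds that

\[
P\!\left(\sqrt{nh}\,(\hat{p}^{h}_{ij}(x) - p_{ij}(x)) \le z\right) - P^{*}\!\left(\sqrt{nh}\,(\hat{p}^{*h}_{ij}(x) - \hat{p}^{g}_{ij}(x)) \le z\right) \xrightarrow{P} 0
\]

where $P^{*}(\cdot)$ denotes the bootstrap probability conditional on the original dataset.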



Note that the support for $X$ was thinned slightly from $S_X^{h}$ to $S_X^{g} = [l_X + g, u_X - g]$. Given that
$\log(\widehat{OR}^{h}(x))$ is merely a continuous function of the $\hat{p}^{h}_{ij}(x)$, it therefore suffices to
propose the following:



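Theorem A.3. Under Assumptions 2.1, A.1, A.2, A.4 and A.5, $\forall z \in \mathbb{R}$ it holds that

\[
P\!\left(\sqrt{nh}\,(\log(\widehat{OR}^{h}(x)) - \log(OR(x))) \le z\right) - P^{*}\!\left(\sqrt{nh}\,(\log(\widehat{OR}^{*h}(x)) - \log(\widehat{OR}^{g}(x))) \le z\right) \xrightarrow{P} 0
\]

where

\[
\log(\widehat{OR}^{*h}(x)) = \log\frac{(\hat{p}^{*h}_{11}(x) + \epsilon(x))(\hat{p}^{*h}_{22}(x) + \epsilon(x))}{(\hat{p}^{*h}_{12}(x) + \epsilon(x))(\hat{p}^{*h}_{21}(x) + \epsilon(x))},
\qquad
\log(\widehat{OR}^{g}(x)) = \log\frac{(\hat{p}^{g}_{11}(x) + \epsilon(x))(\hat{p}^{g}_{22}(x) + \epsilon(x))}{(\hat{p}^{g}_{12}(x) + \epsilon(x))(\hat{p}^{g}_{21}(x) + \epsilon(x))}.
\]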



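Proof. Adding and subtracting the logarithm of the unamended estimator (for which $\epsilon(x) = 0$), we can write

\[
\log(\widehat{OR}^{h}(x)) - \log(OR(x)) = \sum_{ij=11}^{22} (-1)^{i+j} \left\{ \log(\hat{p}^{h}_{ij}(x) + \epsilon(x)) - \log(\hat{p}^{h}_{ij}(x)) + \log(\hat{p}^{h}_{ij}(x)) - \log(p_{ij}(x)) \right\}
\]
\[
\qquad = \sum_{ij=11}^{22} (-1)^{i+j}\, \frac{\hat{p}^{h}_{ij}(x) - p_{ij}(x)}{p_{ij}(x)} + O_P(\epsilon(x)) + O_P\!\left(\sum_{ij=11}^{22} (\hat{p}^{h}_{ij}(x) - p_{ij}(x))^2\right)
\]

from suitable Taylor expansions. Given $\epsilon(x) \sim (nh)^{-1}$ and $\hat{p}^{h}_{ij}(x) - p_{ij}(x) = O_P((nh)^{-1/2})$,
it follows

\[
\sqrt{nh}\left(\log(\widehat{OR}^{h}(x)) - \log(OR(x))\right) = \sqrt{nh} \sum_{ij=11}^{22} (-1)^{i+j}\, \frac{\hat{p}^{h}_{ij}(x) - p_{ij}(x)}{p_{ij}(x)} + O_P((nh)^{-1/2}).
\]

Similarly,

\[
\log(\widehat{OR}^{*h}(x)) - \log(\widehat{OR}^{g}(x)) = \sum_{ij=11}^{22} (-1)^{i+j} \left\{ \log(\hat{p}^{*h}_{ij}(x) + \epsilon(x)) - \log(\hat{p}^{g}_{ij}(x) + \epsilon(x)) \right\}
\]
\[
\qquad = \sum_{ij=11}^{22} (-1)^{i+j}\, \frac{\hat{p}^{*h}_{ij}(x) - \hat{p}^{g}_{ij}(x)}{\hat{p}^{g}_{ij}(x) + \epsilon(x)} + O_P\!\left(\sum_{ij=11}^{22} (\hat{p}^{*h}_{ij}(x) - \hat{p}^{g}_{ij}(x))^2\right).
\]

As $\hat{p}^{g}_{ij}(x) + \epsilon(x) \to p_{ij}(x)$ in probability as $n \to \infty$, we get that the limit bootstrap
distribution of $\sqrt{nh}\,(\log(\widehat{OR}^{*h}(x)) - \log(\widehat{OR}^{g}(x)))$ (i.e. the distribution conditional on the initial
sample) is the same as the limit distribution of $\sqrt{nh}\,(\log(\widehat{OR}^{h}(x)) - \log(OR(x)))$, using
Theorem A.2.

□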



Note the same $\epsilon(x) = \nu_0/(2nhf(x))$ is used in the definition of $\widehat{OR}^{g}(x)$, although a correction
using the bandwidth $g$ seems more natural there. However, under Assumptions A.4 and A.5,
$\nu_0/(2ngf(x))$ converges to 0 faster than $\nu_0/(2nhf(x))$, and the stated result is not affected by
this choice. We therefore prefer using the same $\epsilon(x)$ throughout for simplicity.
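The structure of Theorem A.3 suggests how a bootstrap interval might be computed. The Python sketch below is one plausible reading only; the procedure actually used is the one described in Section 4 and may differ. It assumes that the covariate values are held fixed, that each bootstrap cell is a single multinomial draw from the pilot estimates $\hat{p}^{g}(X_k)$, that the cell probabilities are Nadaraya-Watson estimates with an Epanechnikov kernel, and that a basic (reverse-percentile) interval is formed. All function names are hypothetical.

```python
import numpy as np

NU0 = 0.6  # integral of K(u)^2 for the Epanechnikov kernel

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def cell_probs(x0, x, cell, bw):
    """Kernel-weighted proportions of the four cells (coded 0..3) at x0, and the weight total."""
    w = epanechnikov((x0 - x) / bw)
    total = w.sum()
    return np.bincount(cell, weights=w, minlength=4) / total, total

def log_or(p, eps):
    """Amended log odds ratio; cells ordered (1,1), (1,2), (2,1), (2,2)."""
    return (np.log(p[0] + eps) - np.log(p[1] + eps)
            - np.log(p[2] + eps) + np.log(p[3] + eps))

def bootstrap_interval(x0, x, cell, h, g, B=500, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    p_h, total_h = cell_probs(x0, x, cell, h)
    eps = NU0 / (2.0 * total_h)        # nu0 / (2 n h f_hat(x0)), f_hat = total_h / (n h)
    estimate = log_or(p_h, eps)
    # Bootstrap centre: pilot-bandwidth estimate with the same eps throughout.
    centre = log_or(cell_probs(x0, x, cell, g)[0], eps)
    # Pilot cell probabilities at every observed covariate value (n x 4).
    pilot = np.array([cell_probs(xk, x, cell, g)[0] for xk in x])
    cum = np.cumsum(pilot, axis=1)
    diffs = np.empty(B)
    for b in range(B):
        # One multinomial draw (size 1) per observation, by inverting the CDF.
        u = rng.random(x.size)
        cell_star = np.minimum((u[:, None] > cum).sum(axis=1), 3)
        diffs[b] = log_or(cell_probs(x0, x, cell_star, h)[0], eps) - centre
    q_lo, q_hi = np.quantile(diffs, [(1 - level) / 2, (1 + level) / 2])
    return estimate - q_hi, estimate - q_lo
```

The interval mimics the theorem: the bootstrap distribution of the difference between the resampled estimate and the pilot estimate stands in for the distribution of the difference between the estimate and the true $\log(OR(x))$.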



B Supplementary Material 

B.1 Results of Integrated MSE for Model A using bandwidths estimated
via cross-validation

Table 6: Integrated MSE of the three nonparametric (I, II and III), the GLM, and the GAM estimators of
$\log(OR(x))$ for Model A at various sample sizes n. The best estimator in each configuration is highlighted
in bold. Results for the integrated bias were very similar to the original results presented in Table 3 and therefore are
not reproduced below.

Model   n       I        II       III      GLM      GAM
A       50      1.090    0.494    0.719    0.893    1.076
        100     0.517    0.243    0.366    0.344    0.646
        250     0.230    0.113    0.182    0.127    0.197
        1000    0.075    0.038    0.040    0.042    0.045